We performed a genome-wide association study (GWAS) and
a multistage meta-analysis of type 2 diabetes (T2D) in Punjabi
Sikhs from India. Our discovery GWAS in 1,616 individuals (842
case subjects) was followed by in silico replication of the top 513
independent single nucleotide polymorphisms (SNPs) (P < 10⁻³)
in Punjabi Sikhs (n = 2,819; 801 case subjects). We further replicated
66 SNPs (P < 10⁻⁴) through genotyping in a Punjabi Sikh
sample (n = 2,894; 1,711 case subjects). On combined meta-analysis
in Sikh populations (n = 7,329; 3,354 case subjects),
we identified a novel locus in association with T2D at 13q12
represented by a directly genotyped intronic SNP (rs9552911,
P = 1.82 × 10⁻⁸) in the SGCG gene. Next, we undertook in silico
replication (stage 2b) of the top 513 signals (P < 10⁻³) in 29,157
non-Sikh South Asians (10,971 case subjects) and de novo genotyping
of up to 31 top signals (P < 10⁻⁴) in 10,817 South Asians
(5,157 case subjects) (stage 3b). In combined South Asian meta-analysis,
we observed six suggestive associations (P < 10⁻⁵ to <
10⁻⁷), including SNPs at HMG1L1/CTCFL, PLXNA4, SCAP, and
chr5p11. Further evaluation of 31 top SNPs in 33,707 East Asians
(16,746 case subjects) (stage 3c) and 47,117 Europeans (8,130 case
subjects) (stage 3d), and joint meta-analysis of 128,127 individuals 
(44,358 case subjects) from 27 multiethnic studies, did not reveal 
any additional loci nor was there any evidence of replication for the 
new variant. Our findings provide new evidence on the presence of a 
population-specific signal in relation to T2D, which may provide addi- 
tional insights into T2D pathogenesis. Diabetes 62:1746-1755, 2013 




South Asians (people originating from the Indian 
subcontinent) comprise more than a quarter of 
the global population and contribute the highest 
number of patients with type 2 diabetes (T2D) (1). 
According to latest estimates, ~61 million people in India
alone are currently afflicted with T2D, and their number
is projected to increase to ~101 million by 2030 (2). Consequently,
~60% of the world's coronary artery disease
(CAD), a principal cause of mortality in individuals with
T2D, is expected to occur in India (3). There is consider- 
able ethnic difference in the prevalence and progression 
of T2D and CAD. In addition to environmental factors, 
genetic factors influence disease susceptibility (4). The 
incidence of T2D and CAD is about three to five times 
higher in immigrant South Asians compared with Euro- 
Caucasians, and the age of onset of T2D is roughly a 
decade earlier in South Asians than in Europeans (5-7). 
The higher prevalence of T2D among South Asians settled 
in developed countries compared with the host popula- 
tion reflects the genetic and ethnic predisposition to 
cardiometabolic disease under an adverse environment 
and the joint effects of genes and environment in the 
predisposition to T2D (8). For these reasons, we con- 
ducted ethnic-specific genetic studies in a Sikh population 
to dissect genetic pathways that may contribute to T2D 
etiology in different ethnic groups. 

The vast majority of genome-wide association studies 
(GWAS) on T2D so far have been performed on Europeans. 
Studies on non-European populations, especially those with 
unique demographic and cultural histories, are important 
for identifying population-specific linkage disequilibrium 
(LD) patterns and environmental factors that may modulate 
disease risk or protection (9). Interestingly, many, but not 
all, of the common loci originally identified in Europeans
have been replicated in non-European groups (10-18). Re- 
cent GWAS in non-European populations have yielded in- 
triguing new variants (19-21), including six novel signals 
in South Asians represented by single nucleotide poly- 
morphisms (SNPs) near GRB14, ST6GAL1, VPS26A, 
HMG20A, AP3S2, and HNF4A in our recent meta-analysis of 
GWAS (22). Given the existence of marked genetic vari- 
ability among South Asian communities, in addition to 
diversity in culture, language, caste system, physical ap- 
pearance, and diet, they do not constitute a single homoge- 
neous community (23). Therefore, screening populations 
with a different genetic and racial background or environ- 
mental exposures may improve insights about the disease 
and genetic risk factors (24). 

People from India have a complex racial history com- 
plicated by the presence of a caste system that has pro- 
hibited interbreeding to a great extent and consequently 
separated people into numerous endogamous groups (25). 
The Sikhs, a relatively young, inbred population of ~26
million (2% of the Indian population), are from the north- 
western province of India and follow a distinct and unique 
religion born ~500 years ago in Punjab. They have an interesting
background for "nontraditional" disease enrichment
in the absence of conventional risk factors such as
smoking, obesity, and a diet rich in meats (26). Sikhs do 
not smoke or chew tobacco because of religious and cultural
compulsions, and ~50% of them are lifelong vegetarians.
Despite the absence of these lifestyle-related risk
factors, T2D and CAD have reached epidemic proportions 
in Sikhs. Our initial genetic studies in a Sikh cohort as part 
of the Asian Indian Diabetic Heart Study (AIDHS) or the 
Sikh Diabetes Study (SDS) revealed an association of FTO,
MTNR1B, ADIPOQ, and PPARG polymorphisms with
T2D and risk factors in the absence of obesity (11,27,28). 
In this investigation, we conducted a GWAS in a relatively 
homogenous Punjabi Sikh population of 1,850 individuals 
and performed multistage replication in up to 27 case- 
control studies of Punjabi, other South Asian, East Asian, 
and Caucasian ancestries (total n = 128,127; 44,358 T2D 
case and 83,769 control subjects) (Supplementary Tables
1 and 2). Study design of the discovery, replication, and
meta-analysis phases was optimized to detect new population- 
specific and multiethnic T2D loci (Fig. 1). One important 
difference in the current study from our previous South 
Asian GWAS (22) is that in the previous study, the SNPs 
that were common between South Asians and Europeans 
were selected for replication based on the European Di- 
abetes Genetics Replication and Meta-analysis (DIAGRAM) 
sample. However, in this study, the SNP selection was 
prioritized based on the top signals (P < 10⁻³) from our
discovery Sikh cohort. 

RESEARCH DESIGN AND METHODS 
Participants. Participants were part of the Punjabi Sikh GWAS. 
Study sample and characteristics. Our primary Sikh GWAS (discovery) 
cohort used in this investigation is comprised of 1,616 individuals from the 
Punjabi Sikh population that was a part of the AIDHS (also named the SDS). The 
AIDHS/SDS has unique characteristics that are ideal for genetic studies. Sikhs 
are strictly a nonsmoking population, and ~50% of participants are teetotalers
and life-long vegetarians. All individuals for the GWAS discovery cohort were 
recruited from one geographical location. Diagnosis of T2D was confirmed by 
scrutinizing medical records for symptoms and use of medication and mea- 
suring fasting glucose levels according to the guidelines of the American Di- 
abetes Association (29), as described previously (11). Data on lipids, insulin, 
glucose, anthropometric measurements, education, socioeconomic status, job 
grade, diet, and physical activity were available on >95% of the AIDHS/SDS 
individuals selected for this study. Dietary questions involving alcohol con- 
sumption were scored using a scale from 0 to 5; details are described else- 
where (26). T2D is often asymptomatic and remains undiagnosed for many 
years, especially in people from the developing world due to poor healthcare 
provisions. Therefore, it is reasonable to assume that the actual age of onset of 
T2D in Sikhs may range from 39 to 42 years of age compared with the ob- 
served age at diagnosis (46 years). This age is in sharp contrast to the mean 
age at onset of 60 years or above in developed countries (5,26,30). A medical 
record indicating either 1) a fasting plasma glucose level >7.0 mmol/L (>126
mg/dL) after a minimum 12-h fast or 2) a 2-h postglucose level of >11.1 mmol/L
(>200 mg/dL) estimated during a 2-h oral glucose tolerance test on more 
than one occasion, combined with symptoms of diabetes, confirmed the di- 
agnosis. Impaired fasting glucose is defined as a fasting blood glucose level 
>5.6 mmol/L (>100 mg/dL) but <7.0 mmol/L (<126 mg/dL). Impaired glucose 
tolerance is defined as a 2-h OGTT >7.8 mmol/L (>140 mg/dL) but <11.1 mmol/L 
(<200 mg/dL). The 2-h OGTTs were performed according to the criteria of the 
World Health Organization (75-g oral load of glucose). BMI was calculated as 
weight (kg)/height (m)², and waist-to-hip ratio was calculated as the ratio of
abdomen or waist circumference to hip circumference. Subjects with type 1 
diabetes, or those with a family member with type 1 diabetes, or rare forms of 
T2D subtypes (maturity-onset diabetes of the young) or secondary diabetes 
(from, e.g., hemochromatosis or pancreatitis) were excluded from the study. 
The selection of control subjects was based on a fasting glucose < 100.8 mg/dL 
or a 2-h glucose < 141.0 mg/dL. Subjects with impaired fasting glucose or im- 
paired glucose tolerance were excluded when data were analyzed for associa- 
tion of the variants with T2D. All blood samples were obtained at the baseline 
visits. All participants signed a written informed consent for the investigations. 
The study was reviewed and approved by the University of Oklahoma Health 
Sciences Center Institutional Review Board, as well as the Human Subject 
Protection Committee at the participating hospitals and institutes in India. 
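
The glycemic thresholds and the BMI formula described above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the study's phenotyping code: the names `bmi` and `classify_glycemia` are hypothetical, glucose values are assumed to be in mmol/L, lower bounds are treated as inclusive (an assumption, since the printed strict inequalities would leave exact boundary values unclassified), and the requirement for repeated measurements plus symptoms is not modeled.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def classify_glycemia(fasting: Optional[float], two_hour: Optional[float]) -> str:
    """Classify one subject from fasting and 2-h OGTT glucose (both in mmol/L).

    Hypothetical helper; inclusive lower bounds are an assumption of this sketch.
    """
    # Diabetic range: fasting 7.0 mmol/L (126 mg/dL) or 2-h 11.1 mmol/L (200 mg/dL).
    if (fasting is not None and fasting >= 7.0) or (
        two_hour is not None and two_hour >= 11.1
    ):
        return "T2D"
    # Impaired fasting glucose: 5.6 to below 7.0 mmol/L.
    if fasting is not None and 5.6 <= fasting < 7.0:
        return "IFG"
    # Impaired glucose tolerance: 2-h value 7.8 to below 11.1 mmol/L.
    if two_hour is not None and 7.8 <= two_hour < 11.1:
        return "IGT"
    # Everything else falls in the control range.
    return "control"
```

In an association analysis following the text, subjects labeled "IFG" or "IGT" would be dropped, leaving only "T2D" and "control".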
South Asian cohorts. For stage 2a replication, the Sikh component of the 
London Life Sciences Population (LOLIPOP) study (22) comprised 2,819 individuals
(801 T2D case and 2,018 control subjects). For stage 2b, the non-Sikh
South Asian components of the LOLIPOP and the Pakistan Risk of Myocardial 
Infarction Study (PROMIS; and the Risk Assessment of Cerebrovascular Events 
[RACE] study) GWAS (22) comprised 29,157 individuals (10,971 case and 18,186 
control subjects) (22). Stage 3a Punjabi-specific replication was carried out on 
2,894 individuals (1,711 case and 1,183 control subjects) of Punjabi ancestry 
from India as part of AIDHS/SDS, and replication testing among South Asians for 
stage 3b was carried out among 10,817 participants (5,157 case and 5,660 control 
subjects), which were part of the following studies: Asian Indians from the 
Singapore Indian Eye (SINDI) study (31), the Chennai Urban Rural Epidemiol- 
ogy Study (CURES) (32), the Diabetes Genetics in Pakistan (DGP) study, the 
UK Asian Diabetes Study (UKADS) (33), and the Sri Lankan Diabetes Study 
(SLDS) (34). Details of the contributing cohorts are provided in the Supplementary
Data.

FIG. 1. Summary of study design and outcome of key findings. Panels include Stage 1 AIDHS/SDS Sikhs (842 cases; 774 controls) and Stage 3d DIAGRAM+ European GWAS (8,130 cases; 38,987 controls).

East Asian cohorts. Replication testing for stage 3c was carried out on a total
of 33,707 East Asians, comprising 14,890 Japanese from RIKEN (n = 7,480
genotyped) and BioBank Japan (n = 7,410 GWAS) (19,35) and 18,817 individuals
of East Asian ancestry as part of the Asian Genetic Epidemiology
Network (AGEN) with genotype data available from eight GWAS (21). 
DIAGRAM (Euro-Caucasians). Associations of SNPs with T2D among 
Europeans were tested in silico using results from the genome-wide association 
phase of the DIAGRAM study comprising 47,117 subjects (36). 
Genotyping and quality control. Genomic DNA was extracted from buffy 
coats using QiaAmp blood kits (Qiagen, Chatsworth, CA) or by the salting-out 
procedure (37). Stage 1 genome-wide genotyping was performed using a Human
660W-Quad BeadChip panel (Illumina, Inc., San Diego, CA). We performed
pairwise identity-by-state clustering in PLINK across all individuals to
assess population stratification; no population outliers were detected. Related 
individuals with pi-hat >0.3 and samples with <93% call rate were excluded, 
as were SNPs with call rate <95%. Also excluded were SNPs with Hardy- 
Weinberg equilibrium (HWE) P < 10~^ or minor allele frequency (MAF) <1%. 
After quality control, 524,216 directly genotyped SNPs in 1,616 subjects (842 
case and 774 control subjects) were available for association testing. 
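
The SNP-level filters described above (call rate, Hardy-Weinberg equilibrium, and minor allele frequency) can be illustrated per SNP from genotype counts. This is a hedged sketch rather than the PLINK procedure the authors used: `GenotypeCounts` and the function names are hypothetical, the HWE test is a 1-df chi-square approximation instead of an exact test, and the HWE P-value cutoff is left as a required argument because it should be taken from the study protocol.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic SNP (hypothetical container)."""
    hom_major: int
    het: int
    hom_minor: int
    missing: int

    @property
    def called(self) -> int:
        return self.hom_major + self.het + self.hom_minor


def call_rate(c: GenotypeCounts) -> float:
    return c.called / (c.called + c.missing)


def minor_allele_frequency(c: GenotypeCounts) -> float:
    freq = (2 * c.hom_minor + c.het) / (2 * c.called)
    return min(freq, 1.0 - freq)


def hwe_chi2_p(c: GenotypeCounts) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions."""
    n = c.called
    q = (2 * c.hom_minor + c.het) / (2 * n)
    p = 1.0 - q
    if q in (0.0, 1.0):  # monomorphic: no deviation can be tested
        return 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.hom_major, c.het, c.hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(c: GenotypeCounts, hwe_p_threshold: float,
                  min_call_rate: float = 0.95, min_maf: float = 0.01) -> bool:
    """Keep a SNP only if it clears all three filters named in the text."""
    if c.called == 0:
        return False
    return (call_rate(c) >= min_call_rate
            and minor_allele_frequency(c) >= min_maf
            and hwe_chi2_p(c) >= hwe_p_threshold)
```

The defaults of 95% call rate and 1% MAF mirror the stage 1 thresholds stated above; sample-level filters (pi-hat, sample call rate) are not covered by this sketch.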

Genotyping for de novo SNPs in the replication samples was performed by 
Sequenom MassArray (BioMark HD MX/HX Genetic Analysis System; Fluidigm) 
or KASPAR (LGC Genomics KBioscience, London, U.K.). Samples and SNPs 
with <95% call rate were excluded, as were those that deviated from HWE at 
P < 10~^. The associations of SNPs with T2D were tested in each cohort 
separately. 
Statistical analyses 

Association testing. Associations of SNPs with T2D were tested using logistic 
regression and an additive genetic model. Age, sex, BMI, and 5 or 10 principal 
components to adjust for residual population stratification were included as 
covariates. As the existing HapMap2 or HapMap3 and 1000 Genomes data do 
not include Sikhs, the 5 or 10 principal components used for this correction 
were estimated using our Sikh population sample and not the HapMap populations.
After association analyses, the genomic control inflation factor (λ)
was 1.0, so no adjustments were made (Supplementary Fig. 2A and B). 
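
For readers unfamiliar with the genomic control inflation factor, it is commonly computed as the median observed 1-df chi-square statistic divided by the median expected under the null (about 0.455). The sketch below assumes that standard definition and two-sided P values as input; the function name `genomic_control_lambda` is hypothetical and this is not the authors' code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(p_values):
    """Estimate the inflation factor from two-sided association P values.

    Each P value (0 < p <= 1) is converted back to a 1-df chi-square
    statistic by squaring the corresponding standard normal quantile.
    """
    nd = NormalDist()
    chi2_stats = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value near 1.0, as reported here, indicates that the bulk of test statistics follow the null distribution, so no genomic control correction is applied.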

In addition to the analysis of directly genotyped SNPs, we performed im- 
putation using the Impute 2 program (38-40), which determines the probability 
distribution of missing genotypes based on a set of known haplotypes and an 
estimated fine-scale recombination map. Imputation was based on the entire 
multiethnic HapMap3 reference panel of ~1.5 million autosomal SNPs with
MAF >1% in 1,011 individuals from Africa, Asia, Europe, and the Americas 
(including 1,362,138 SNPs from the Indian population of 100 Gujaratis from
Houston [GIH]). Imputation yielded a total of 1,232,008 passing SNPs with
MAF >1% in the Sikh GWAS. Imputed SNPs were analyzed using SNPTEST
(38,40), which implements frequentist tests that calculate P values, parameter
estimates, and their standard errors while accounting for the uncertainty due to
the probability distributions of the imputed genotypes. Analyses were adjusted
for the covariates age, sex, BMI, and five principal components. We included only
those SNPs with an information score >0.5, a measure of the relative statistical
information about the additive genetic effect being estimated, in the discovery
sample as well as in all GWAS used for replication. The genomic
control value for imputed SNPs was 1.02. The inbreeding coefficient and
measures of autozygosity were determined using the program PLINK. We 
identified runs of homozygosity using the metrics defined in Nalls et al. (41),
evaluating 1-Mb autosomal regions with at least 50 adjacent SNPs, with
a sliding window of 50 SNPs including no more than 2 SNPs with missing 
genotypes and one possible heterozygous genotype. 

Stage 2 replication. We selected all independent association signals 
(r² <0.25) with P < 10⁻³ for lookup in GWAS of 1) the Sikh component of the
LOLIPOP GWAS (22) and 2) the non-Sikh South Asian components of the 
LOLIPOP and PROMIS GWAS (22). A fixed-effect, inverse-variance meta- 
analysis (as implemented in METAL) (42) was used to combine the results for 
individual studies. 

Stage 3 replication. Significant association results with P < 10~^ based on 
meta-analysis of stages 1, 2a, and 2b were selected for de novo or in silico 
replication in Sikh, South Asian, other Asian, and European populations. In 
addition, we selected SNPs from a Sikh-only meta-analysis of stages 1 and 2a
for genotyping in an in-house Punjabi Sikh T2D case-control population. In our 
previous South Asian GWAS by Kooner et al. (22), 300 of the 3,200 samples of 
the AIDHS/SDS (used in replication) were genotyped using Illumina 660 Quad
arrays, and the remaining samples (from 1,187 case and 1,632 control subjects)
were genotyped using Sequenom MassARRAYs. However, in this study,
in addition to the GWAS set (n = 1,616), SNPs were genotyped de novo on our
remaining replication set (n = 2,894). Signals with P < 10~^ after meta-analysis
of stages 1, 2a, and 3a were also genotyped in the South Asian, other Asian, 
and European populations to test if they were specific to the Sikh ethnic group 
or spanned ethnicities. All meta-analyses were performed using a fixed-effects, 
inverse-variance meta-analysis implemented in METAL. 
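
The fixed-effects, inverse-variance combination used at each stage can be written out in a few lines. The sketch below is an illustration of the standard estimator, not METAL's implementation: the function name `fixed_effect_meta` is hypothetical, inputs are assumed to be per-study log odds ratios and their standard errors for the same effect allele, and the P value uses a two-sided normal approximation.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g., log odds ratios).
    standard_errors: matching per-study standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value
```

Because each study is weighted by the inverse of its variance, larger cohorts with smaller standard errors dominate the pooled estimate, which is why alleles must be aligned across studies before combining.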
MuTHER Consortium. The Multiple Tissue Human Expression Resource 
(MuTHER; www.muther.ac.uk) includes lymphoblastic cell lines and skin and 
adipose tissue derived simultaneously from a subset of well-phenotyped 
healthy female twins from the Twins UK adult registry. Whole-genome expression
profiling of the samples, each with either two or three technical replicates,
was performed using the Illumina HumanHT-12 v3 BeadChips according to the 
protocol supplied by the manufacturer. Log2-transformed expression signals 
were normalized separately per tissue as follows. Quantile normalization was 
performed across technical replicates of each individual followed by quantile 
normalization across all individuals. Genotyping was performed with a combination
of Illumina arrays HumanHap300, HumanHap610Q, 1M-Duo, and
1.2MDuo 1M. Untyped HapMap2 SNPs were imputed using the IMPUTE software
package (v2). The number of adipose samples with genotypes and expression
values is 776. Association between all SNPs (MAF >5%; IMPUTE info
>0.8) within a gene or within 1 Mb of the gene transcription start or end site and 
normalized expression values was performed with the GenABEL/ProbABEL 
packages using the polygenic linear model incorporating a kinship matrix in 
GenABEL followed by the ProbABEL mmscore score test with imputed geno- 
types. Age and experimental batch were included as cofactors. 
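
The quantile normalization step mentioned above forces every sample to share the same empirical distribution. The following is a minimal pure-Python sketch of one such pass, under the assumptions that each inner list holds the log2 expression values of one sample over the same ordered probes and that ties are broken by sort order; `quantile_normalize` is a hypothetical name and this is not the MuTHER pipeline itself, which applies the operation first across technical replicates and then across individuals.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per sample).

    The value at each rank is replaced by the mean of the values
    holding that rank across all columns.
    """
    n_rows = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across samples, for every rank k.
    rank_means = [
        sum(col[k] for col in sorted_columns) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_rows), key=col.__getitem__)
        out = [0.0] * n_rows
        for rank, row_index in enumerate(order):
            out[row_index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization every column contains the same set of values, only in a different order, so between-sample differences in overall intensity no longer drive the eQTL test.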



RESULTS 

Punjabi Sikh discovery GWAS. Clinical characteristics 
of the stage 1 Punjabi Sikh T2D GWAS cohort and stage 2a 
and 2b (replication) cohorts are described in Supplemen- 
tary Table 3. Principal components analysis revealed little 
population structure (Supplementary Fig. 1). After quality 
control, 524,216 directly genotyped SNPs in 1,616 subjects 
(842 case and 774 control subjects) from 1,850 total sub- 
jects were available for association testing after removing 
samples showing cryptic relatedness through identity-by-descent
sharing. To increase genome coverage, genotypes
were imputed for untyped SNPs using the HapMap3
multiethnic reference panel (see RESEARCH DESIGN AND
METHODS), yielding a total of 1,232,008 SNPs for association
analyses. The reason for choosing a more cosmopolitan 
panel and not restricting to the GIH was based on our own 
data showing equal diversity of the Sikhs from GIH and 
CEU, and based on previously described advantages of 
using a worldwide reference panel (39). We performed 
a GWAS for T2D adjusted for covariates age, sex, BMI, and 
five principal components (Supplementary Fig. 1); no evi- 
dence of inflation was observed (Supplementary Fig. 2A 
and B) (see RESEARCH DESIGN AND METHODS).
Replication and meta-analyses in Punjabi Sikh 
participants. We undertook a two-stage replication in 
T2D case-control samples of Punjabi Sikh ancestry (stages 
2a and 3a in Fig. 1). Lead SNPs representing 513 novel, 
independent (r² <0.25) association signals with P < 10⁻³ in
the discovery GWAS (including only two previously known 
GWAS SNPs from TCF7L2 and IGF2BP2 and excluding 62 
SNPs with P < 10"^ from other known T2D loci) were 
tested for in silico replication in the Punjabi Sikh sub- 
component of the LOLIPOP GWAS comprising 801 T2D 
case and 2,018 control subjects (Supplementary Table 1). 
Top SNPs representing 66 putatively novel signals with 
P < 10⁻⁴ after stage 1 and 2a meta-analysis using a fixed-effects,
inverse-variance approach were directly genotyped
in the stage 3a sample of 2,894 Punjabi Sikh individuals
(1,711 T2D case and 1,183 control subjects) (Fig. 1
and Supplementary Table 2). 

In a combined meta-analysis of the three Punjabi studies 
(n = 7,329), we identified one new locus reaching genome-wide
significance (P < 5 × 10⁻⁸) along with robust replication
of the established SNP rs7903146 in TCF7L2 (P =
3.32 X 10"^^) in Sikhs (Figs. 2, 3, and 4). This novel as- 
sociation signal lies in a 164-kb region of strong LD at 
13q12 (harboring genes gamma-sarcoglycan [SGCG] and
sacsin [SACS]) and is represented by a directly genotyped 
intronic SNP, rs9552911 in SGCG (odds ratio [OR] 0.67 
[95% CI 0.58-0.77], P = 1.82 × 10⁻⁸ for the minor "A" allele)
(Table 1, Fig. 4, and Supplementary Table 5). Excluding
BMI from the logistic regression model did not affect the 
association (Supplementary Table 6). Furthermore, in- 
cluding five additional principal components in the model 
did not attenuate the signal; indeed, the effect and signif- 
icance were slightly improved (Supplementary Table 6). 
The genetic variance (R²) explained by this variant for the
T2D phenotype in Punjabi Sikh discovery and replication 
sets was 1.57 and 1.34%, respectively. There were 15 ad- 
ditional independent loci with suggestive evidence (P < 
10~^ to < 10~^) of association, including six unknown 
regions along with IGF2BP2, originally identified in Cau- 
casians (43) (Supplementary Table 5). Meta-analysis 
results including non-Sikh Punjabis from PROMIS (Paki- 
stan) revealed suggestive association (P < 10~^ to < 10~^) 
at SNPs from three new regions: chromosome 18q21 
ZBTB7C (rs1893835), 20q13, near HMG1L1/CTCFL/
RBM38/PCK1 (rs328506), and 5q33 (rs17053082) (Supplementary
Table 7). Association results for 42 previously
reported T2D loci in the Punjabi cohort are summarized in 
Supplementary Table 14. Most loci showed consistent ef- 
fect in the same direction and 33 out of 42 were associated 
with T2D at P < 0.05 in Sikhs. 
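
As a rough consistency check on the reported effect for rs9552911, a Wald z score and P value can be back-calculated from the published odds ratio and 95% CI (OR 0.67, 0.58-0.77). This is a hedged sketch: `wald_from_or_ci` is a hypothetical helper, it assumes a symmetric CI on the log-odds scale, and because the inputs are rounded to two decimals the result only approximates the reported P value.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds effect, SE, z, and two-sided P from an OR and its CI.

    z_crit is the normal quantile for the CI level (1.959964 for 95%).
    """
    beta = math.log(odds_ratio)
    # The CI spans beta +/- z_crit * SE on the log-odds scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


beta, se, z, p_value = wald_from_or_ci(0.67, 0.58, 0.77)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p_value:.2e}")
```

The recovered P value is on the order of 10⁻⁸, below the genome-wide significance threshold of 5 × 10⁻⁸ used in this study, which agrees with the locus being reported as genome-wide significant.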

Replication/evaluation and meta-analysis in other 
South Asians. In order to identify T2D association sig- 
nals common to Punjabi and other South Asian pop- 
ulations, we tested the association of the 513 top 
independent signals (P < 10⁻³) derived from the discovery
cohort in GWAS from the LOLIPOP, PROMIS, and RACE 
studies as part of stage 2b replication (10,971 T2D case and 
18,186 control subjects) (Fig. 1 and Supplementary Table 
1). Thirty-one signals (P < 10⁻⁴ from an interim analysis
with stage 2b) were further genotyped in 10,817 South 
Asians (5,157 T2D and 5,660 control subjects) (Fig. 1) as 
part of stage 3b replication. Clinical characteristics of the 
stage 3 replication cohorts are described in Supplementary 
Table 4. Combined South Asian meta-analysis revealed 
nominally significant association in six SNPs with MAF 
>5% (P < 10~^), but only the two previously known SNPs 
in TCF7L2 and IGF2BP2 reached genome-wide signifi- 
cance (Table 1 and Supplementary Table 8). Suggestive 
novel signals included SNPs at chromosome 20q13, near
HMG1L1/CTCFL/RBM38/PCK1 (rs328506), 7q32 near
PLXNA4 (rs1593304), 3p21 in SCAP (rs4858889), and 5p11
(rs13155082) (Supplementary Table 8). Further studies
and replication in a larger sample will be required to val- 
idate these results and identify causal variants at these 
loci. 

Multiethnic replication and meta-analysis. To identify 
T2D signals spanning ethnicities, we extended the replication
of 31 SNPs with P < 10⁻⁴ in Punjabis and South Asians
(stage 3b) to East Asians (AGEN+) and Europeans (DIA- 
GRAM+) in stages 3c and 3d, respectively (Fig. 1). Upon 
meta-analysis of 31 loci in Asians (South Asians and AGEN+),
genome-wide associations were only seen in TCF7L2
(rs7903146, P = 1.93 X 10"^^) and IGF2BP2 (rs1470579, P =
1.54 X 10 ^^) (Supplementary Table 9). In joint multiethnic 
meta-analysis on 128,127 individuals from 27 studies, only 
two previously known loci, TCF7L2 (rs7903146, P = 8.53 X 
10"'^) and IGF2BP2 (rs1470579, P = 1.81 X 10"^^), showed
robust associations. Interestingly, none of the Punjabi hits 
could be independently confirmed in AGEN+ or DIAGRAM+ 
(notably, the lead rs9552911 variant from SGCG was 
monomorphic in DIAGRAM+) (Table 1 and Supplementary 
Table 10). Lookup of 50 kb upstream and downstream of 
SNPs within the SGCG locus in the publicly available data of 
the Meta-Analyses of Glucose and Insulin-Related Traits
Consortium (MAGIC) study on glycemic trait GWAS (44,45)
revealed several nominal associations of SNPs with fasting
blood glucose and 2-h glucose levels (Supplementary Fig. 3).
Some of these SNPs also showed an association with fasting
blood glucose and waist or waist-to-hip ratio in Sikhs
(Supplementary Table 11), but none of these were in LD
(r² >0.20) with our lead SNP.

FIG. 2. A: Manhattan plot showing primary genome-wide association analysis of the Punjabi Sikh discovery cohort using directly genotyped
(524,216) SNPs. B: Manhattan plot shows imputed 1,232,008 SNPs on the x-axis and −log10 P value of association on the y-axis. Locations of the
three loci (including one novel locus at SGCG) reached genome-wide significance after combined analysis of the GWAS and replication data in
Punjabi Sikhs.



Gene expression studies. We examined the expression 
of SGCG and neighboring genes (FLJ46358, MIPEP, 
SACS, and sTNFRSFlO) within 1 Mb of the index SNP by 
cis expression quantitative trait locus (eQTL) analysis using
adipose tissue, skin, and lymphoblastic cell line gene
expression data from the MuTHER Consortium, compris- 
ing healthy female twins of European ancestry from Brit- 
ain. Several SNPs in the SGCG region were associated with 
significantly elevated (PeQTL < 10~^ to < 10~^) expression
of SGCG mRNA in adipose tissues (Supplementary Table
12 and Supplementary Fig. 4). One adipose eQTL from
MuTHER (rs572303, PeQTL = 5.47 X 10"^) located within
SGCG showed a nominally significant association with
increased waist circumference in Sikhs (β = 0.67, P = 5.2 X
10~^) (Supplementary Table 11). As shown in Supplementary
Fig. 4, the LD patterns in the region (~1.46 Mb)
surrounding the SGCG variant (rs9552911) varied in East
Asians (JPT), Africans (YRI), Caucasians (CEU), Gujarati
Indians (GIH), and Sikhs. Interestingly, in Caucasians
and Yorubans, this variant was monomorphic. However,
several alternative SNPs from this region in Europeans
were nominally associated with fasting blood glucose
(MAGIC study, r² ranging from 0.10 to 0.20 with the index
SNP [rs9552911]) and mRNA expression of adipose cells in
the MuTHER study (r² ranging from 0.14 to 0.26 with the
index SNP). These data suggest that population differences
may underlie the weak LD. It is possible that a single
causal variant may be responsible for these associations,
but LD may differ between Sikhs, Europeans, and other
populations.

FIG. 3. A: Regional association plot for a new T2D locus detected at 13q12 in the SGCG gene from the genome-wide meta-analysis in Sikhs. B: A
strong confirmation of SNPs in the TCF7L2 gene in Sikh meta-analysis. In these plots, the SNPs showing the most strongly associated signal are
[...] note that there could still be differential LD between the reference panel and the Sikh population. At the bottom of the plot, the locations of known
[...]

Comparative analysis of autozygosity. We further 
looked to compare the distributions of inbreeding coefficients
and autozygosity as described by Nalls et al. (41).
As expected, the inbreeding coefficients in our sample 
were higher compared with two outbred populations of 
European Americans, Coriell, and Baltimore Longitudinal 
Study of Aging (BLSA) (F = 0.041 ± 0.018 in Sikhs vs. F =
0.007 ± 0.019 in Coriell and F = -0.3 ± 0.012 in BLSA), as
assessed by Nalls et al. (41). However, these results were
similar to other Indian populations previously reported by 
Reich et al. (46). No significant difference in inbreeding 
was observed between case and control subjects (P = 
0.59). Autozygosity analysis determined that there were 
19 ± 7 homozygous segments >1 Mb in length, with an 
average length of 2.0 ± 0.95 Mb. Hence, fewer but longer 
autozygous segments were found in our population than in 
outbred populations. No correlation of measures of
autozygosity to age was observed (P > 0.05) across decades
of age.



DISCUSSION 

In this GWAS and multistage meta-analysis, a novel locus 
at 13q12 in the SGCG gene (rs9552911, P = 1.82 × 10⁻⁸)
was identified as associated with T2D susceptibility in 
Punjabi Sikhs from Northern India. 

SGCG is a member of the sarcoglycan complex of 
transmembrane glycoproteins mutated in autosomal re- 
cessive muscular dystrophy, in particular limb-girdle mus- 
cular dystrophy type 2C (LGMD2C). SGCG is expressed in 
skeletal muscle, and its high expression is also seen in 
vascular smooth muscle cells as well as in breast cancer cell 
lines (47,48). Founder mutations in SGCG that cause 
LGMD2C predate migration of the Romani gypsies of 
Europe out of India around 1100 AD (49). Due to complete 
endogamy, this genetically isolated community had an in- 
creased incidence of autosomal recessive LGMD2C. SGCG- 
targeted knockout mice displayed a variety of phenotypes, 
including dystrophic cardiomyopathy and defects in skele- 
tal muscle, metabolism, homeostasis, growth, apoptosis, 
aging, and behavior (50-53). Mice lacking the sarcoglycan 
complex including SGCG in adipose and skeletal muscle
were shown to be glucose intolerant and exhibited whole- 
body insulin resistance due to impaired insulin-stimulated 
glucose uptake in skeletal muscle (54). 

The allelic distribution of the less common "A" (pro- 
tective) allele of rs9552911 ranged from 0.06 to 0.15 in South 
Asians and differed between other South Asians (0.11) and 
Punjabi Sikhs (0.08) (see details in Supplementary Table 
13). Further replication in large independent datasets of 
South Asians and Punjabi Sikhs would be needed to confirm 
the pattern of observed association. In view of the complex 
racial history complicated by a well-defined caste system, 
Indian populations display a great deal of genetic and cul- 
tural diversity (55). Studies suggest that genetic affinity 
among endogamous communities in India is inversely cor- 
related with geographic distance between them (23). 
Therefore, it is possible that undetected causal variant(s) or 
multiple rare variants in LD with this marker arose on a hap- 
lotype tagged by rs9552911 in Punjabi Sikhs after divergence 
from other South and West Indian populations. This variation 
in the index SNP rs9552911 does not appear to be of re- 
cent origin, as suggested by comparative genomic analysis 
(Supplementary Fig. 5). Two important nuclear hormone 
receptors and transcription factors (peroxisome proliferator-activated
receptor-γ [PPAR-γ] [1 and 2] and PPAR-α) bind to
the promoter and intron 1 of the SGCG gene. Further, the 
maturity-onset diabetes of the young 4 (MODY4) locus at chromosome
13q12, represented by insulin promoter factor 1 or
PDX-1, lies next to the SGCG locus. Therefore, further in- 
depth examination by targeted resequencing in the extended 
region and functional studies may reveal putative causative 
variants in this extended region and provide insight into the 
physiological relevance of the observed association. 

In summary, our study identified a novel locus associated 
with T2D in a population of Punjabi Sikh ancestry from 
Northern India. These findings not only provide new in- 
formation on previously unknown regions associated with 
T2D but demonstrate a putative population-specific associ- 
ation that could lead to additional biological insights into 
T2D pathogenesis. 